Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts
Deep learning methods have achieved remarkable success in various areas of artificial intelligence, due to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., that the data distributions of the training and test sets are the same. As a result, current deep learning methods typically generalize poorly under distribution shift, performing badly on test data whose distribution differs from the training data. This significantly hinders their application to real-world scenarios, where the distribution of test data is not always the same as the training distribution in our rapidly evolving world.
This thesis discusses how to construct generalizable deep learning methods under distribution shifts. To achieve this, it first models a prediction task as a structural causal model (SCM), which establishes the relationships between variables using a directed acyclic graph. In an SCM, some variables are easily changed across domains while others are not. However, deep learning methods often unintentionally mix invariant variables with easily changed ones, causing the learned model to deviate from the true one and resulting in poor generalization under distribution shift.
To remedy this issue, we propose specific algorithms that model the invariant part of the SCM with deep learning methods, and experimentally show that doing so helps the trained model generalize well to different distributions of the same task. Finally, we further propose to identify and model the variant information in the new test distribution so that we can fully adapt the trained deep learning model accordingly.
We show the method can be extended to several practical applications, such as classification under label shift, image translation under semantics shift, robotics control under dynamics generalization, and generalizing large language models to visual question-answering tasks.
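The invariant-versus-variant distinction above can be illustrated with a toy simulation. All of the mechanisms, coefficients, and shift values below are hypothetical and chosen only to make the point: a classifier that relies on the stable (invariant) mechanism of an SCM keeps working under distribution shift, while one that relies on an easily changed mechanism does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_domain(n, spurious_shift):
    # Toy structural causal model: the label y produces an invariant feature
    # x_inv through a mechanism that is stable across domains, while x_sp is
    # produced by a domain-dependent (easily changed) mechanism.
    y = rng.integers(0, 2, size=n)
    x_inv = y + 0.1 * rng.standard_normal(n)            # invariant mechanism
    x_sp = spurious_shift * y + rng.standard_normal(n)  # variant mechanism
    return np.stack([x_inv, x_sp], axis=1), y

def accuracy(feature, X, y):
    # Threshold classifier on a single feature.
    return float(np.mean((X[:, feature] > 0.5) == y))

X_tr, y_tr = sample_domain(2000, spurious_shift=2.0)   # training domain
X_te, y_te = sample_domain(2000, spurious_shift=-2.0)  # shifted test domain

acc_invariant = accuracy(0, X_te, y_te)  # relies on the stable part of the SCM
acc_spurious = accuracy(1, X_te, y_te)   # relies on the easily changed part
```

On the shifted test domain, the invariant-feature classifier stays near-perfect while the spurious-feature classifier collapses, which is exactly the failure mode the thesis attributes to mixing the two kinds of variables.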
Long Text Generation via Adversarial Training with Leaked Information
Automatically generating coherent and semantically meaningful text has many
applications in machine translation, dialogue systems, image captioning, etc.
Recently, by combining with policy gradient, Generative Adversarial Nets
(GANs) that use a discriminative model to guide the training of the generative
model as a reinforcement learning policy have shown promising results in text
generation. However, the scalar guiding signal is only available after the
entire text has been generated and lacks intermediate information about text
structure during the generative process. As such, success is limited when
the generated text samples are long (more than 20 words). In this
paper, we propose a new framework, called LeakGAN, to address the problem for
long text generation. We allow the discriminative net to leak its own
high-level extracted features to the generative net to further help the
guidance. The generator incorporates such informative signals into all
generation steps through an additional Manager module, which takes the
extracted features of current generated words and outputs a latent vector to
guide the Worker module for next-word generation. Our extensive experiments on
synthetic data and various real-world tasks with Turing test demonstrate that
LeakGAN is highly effective in long text generation and also improves the
performance in short text generation scenarios. More importantly, without any
supervision, LeakGAN would be able to implicitly learn sentence structures only
through the interaction between Manager and Worker.
Comment: 14 pages, AAAI 2018
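The Manager/Worker interaction described above can be sketched in a few lines. This is a toy stand-in, not LeakGAN itself: the random matrices below replace trained networks, and `leaked_features` replaces the discriminator's feature extractor whose activations are "leaked" to the generator at every step.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, FEAT, GOAL = 8, 6, 4

# Hypothetical parameters standing in for trained Manager and Worker networks.
W_manager = rng.standard_normal((GOAL, FEAT))  # features -> latent goal
W_worker = rng.standard_normal((VOCAB, GOAL))  # goal -> token scores

def leaked_features(tokens):
    # Stand-in for the discriminator's high-level feature extractor, which
    # LeakGAN leaks to the generator during every generation step.
    f = np.zeros(FEAT)
    for t in tokens:
        f[t % FEAT] += 1.0
    return f / max(len(tokens), 1)

def generate(length):
    tokens = []
    for _ in range(length):
        goal = np.tanh(W_manager @ leaked_features(tokens))  # Manager step
        logits = W_worker @ goal                             # Worker step
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens

sample = generate(20)
```

The key structural point survives even in this sketch: the guiding signal reaches the generator at every step through the goal vector, rather than only once after the full text is produced.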
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
Text-to-image generative models have attracted rising attention for flexible
image editing via user-specified descriptions. However, text descriptions alone
are not enough to elaborate the details of subjects, often compromising the
subjects' identity or requiring additional per-subject fine-tuning. We
introduce a new framework called \textit{Paste, Inpaint and Harmonize via
Denoising} (PhD), which leverages an exemplar image in addition to text
descriptions to specify user intentions. In the pasting step, an off-the-shelf
segmentation model is employed to identify a user-specified subject within an
exemplar image which is subsequently inserted into a background image to serve
as an initialization capturing both scene context and subject identity in one.
To guarantee the visual coherence of the generated or edited image, we
introduce an inpainting and harmonizing module to guide the pre-trained
diffusion model to seamlessly blend the inserted subject into the scene
naturally. As we keep the pre-trained diffusion model frozen, we preserve its
strong image synthesis ability and text-driven ability, thus achieving
high-quality results and flexible editing with diverse texts. In our
experiments, we apply PhD to both subject-driven image editing tasks and
explore text-driven scene generation given a reference subject. Both
quantitative and qualitative comparisons with baseline methods demonstrate that
our approach achieves state-of-the-art performance in both tasks. More
qualitative results can be found at
\url{https://sites.google.com/view/phd-demo-page}.
Comment: 10 pages, 12 figures
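The pasting step of the pipeline above is mechanically simple and can be sketched directly; the segmentation model and the diffusion-based inpainting-and-harmonizing module are the learned parts and are not modeled here. All array shapes and pixel values below are hypothetical.

```python
import numpy as np

def paste_subject(exemplar, mask, background, top_left):
    # Pasting step only: copy the segmented subject pixels from the exemplar
    # into the background at `top_left`. The result is the initialization
    # that PhD's frozen diffusion model would then inpaint and harmonize.
    out = background.copy()
    h, w = mask.shape
    r, c = top_left
    region = out[r:r + h, c:c + w]  # view into the output image
    region[mask] = exemplar[mask]   # transfer only the subject pixels
    return out

# Toy images: a 3x3 exemplar whose center pixel is the segmented subject.
exemplar = np.full((3, 3), 9, dtype=np.uint8)
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True
background = np.zeros((6, 6), dtype=np.uint8)
edited = paste_subject(exemplar, mask, background, top_left=(2, 2))
```

Only the masked subject pixel lands in the background, which is why the initialization captures subject identity and scene context at once, leaving the visual blending to the frozen diffusion model.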
Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Unlike perfect information games, where all elements are known to every
player, imperfect information games emulate the real-world complexities of
decision-making under uncertain or incomplete information. GPT-4, the recent
breakthrough in large language models (LLMs) trained on massive passive data,
is notable for its knowledge retrieval and reasoning abilities. This paper
delves into the applicability of GPT-4's learned knowledge for imperfect
information games. To achieve this, we introduce \textbf{Suspicion-Agent}, an
innovative agent that leverages GPT-4's capabilities for performing in
imperfect information games. With proper prompt engineering to achieve
different functions, Suspicion-Agent based on GPT-4 demonstrates remarkable
adaptability across a range of imperfect information card games. Importantly,
GPT-4 displays a strong high-order theory of mind (ToM) capacity, meaning it
can understand others' mental states and intentionally influence their
behavior. Leveraging
this, we design a planning strategy that enables GPT-4 to competently play
against different opponents, adapting its gameplay style as needed, while
requiring only the game rules and descriptions of observations as input. In the
experiments, we qualitatively showcase the capabilities of Suspicion-Agent
across three different imperfect information games and then quantitatively
evaluate it in Leduc Hold'em. The results show that Suspicion-Agent can
potentially outperform traditional algorithms designed for imperfect
information games, without any specialized training or examples. To
encourage deeper insights within the community, we make our game-related
data publicly available.
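The recipe above, rules and observations as the only inputs plus an explicit theory-of-mind reasoning step, can be illustrated as a prompt template. The template below is hypothetical; the actual prompts used by Suspicion-Agent are not reproduced here.

```python
def build_plan_prompt(rules, observation, opponent_actions):
    # Hypothetical prompt assembly: the agent needs only the game rules and
    # a textual observation, and a first-order theory-of-mind step asks the
    # model to reason about the opponent's beliefs before choosing an action.
    history = ", ".join(opponent_actions) if opponent_actions else "none"
    return "\n".join([
        "You are playing an imperfect information card game.",
        f"Rules: {rules}",
        f"Observation: {observation}",
        f"Opponent's past actions: {history}",
        "Step 1: Infer what the opponent likely believes about your hand.",
        "Step 2: Pick the action that best exploits that belief.",
        "Answer with a single legal action.",
    ])

prompt = build_plan_prompt(
    rules="Leduc Hold'em: 6 cards, one private card each, one public card.",
    observation="Your card is the Queen; the opponent just raised.",
    opponent_actions=["call", "raise"],
)
```

Swapping the two "Step" lines for different reasoning scaffolds is how prompt engineering realizes the different agent functions the abstract mentions.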
Texygen: A Benchmarking Platform for Text Generation Models
We introduce Texygen, a benchmarking platform to support research on
open-domain text generation models. Texygen not only implements a majority of
common text generation models, but also covers a set of metrics that evaluate
the diversity, quality, and consistency of the generated texts. The Texygen
platform could help standardize research on text generation and facilitate the
sharing of fine-tuned open-source implementations among researchers. As a
consequence, this would help improve the reproducibility and reliability of
future research work in text generation.
Comment: 4 pages
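One representative diversity metric from the Texygen suite is Self-BLEU, which scores each generated text against all the others; lower values mean more diverse output. The sketch below uses a simplified single-order n-gram precision rather than the full BLEU with brevity penalty and geometric mean over orders.

```python
from collections import Counter

def ngram_precision(candidate, references, n):
    # Modified n-gram precision of one token list against a reference set,
    # the core quantity behind BLEU: clip each n-gram's count by its maximum
    # count in any reference.
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter()
    for r in references:
        rc = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        for g, c in rc.items():
            ref[g] = max(ref[g], c)
    total = sum(cand.values())
    return sum(min(c, ref[g]) for g, c in cand.items()) / total if total else 0.0

def self_bleu(texts, n=2):
    # Diversity metric: average precision of each text against all others.
    return sum(ngram_precision(t, texts[:i] + texts[i + 1:], n)
               for i, t in enumerate(texts)) / len(texts)

identical = [["the", "cat", "sat"]] * 3
distinct = [["the", "cat", "sat"], ["a", "dog", "ran"]]
```

A degenerate generator that emits the same sentence every time scores 1.0; fully distinct outputs score 0.0, which is why the metric is reported alongside quality metrics rather than alone.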
GenORM: Generalizable One-shot Rope Manipulation with Parameter-Aware Policy
Due to the inherent uncertainty in their deformability during motion,
previous methods in rope manipulation often require hundreds of real-world
demonstrations to train a manipulation policy for each rope, even for simple
tasks such as rope goal reaching, which hinders their application in our
ever-changing world. To address this issue, we introduce GenORM, a framework
that allows the manipulation policy to handle different deformable ropes with a
single real-world demonstration. To achieve this, we augment the policy by
conditioning it on deformable rope parameters and training it with a diverse
range of simulated deformable ropes so that the policy can adjust actions based
on different rope parameters. At the time of inference, given a new rope,
GenORM estimates the deformable rope parameters by minimizing the disparity
between the grid density of point clouds of real-world demonstrations and
simulations. With the help of a differentiable physics simulator, we require
only a single real-world demonstration. Empirical validations on both simulated
and real-world rope manipulation setups clearly show that our method can
manipulate different ropes with a single demonstration and significantly
outperforms the baseline in both environments (a 62% improvement on in-domain
ropes and a 15% improvement on out-of-distribution ropes in simulation, and a
26% improvement in the real world), demonstrating the effectiveness of our
approach in
one-shot rope manipulation.
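The parameter-estimation step described above can be sketched as a small system-identification loop. Everything below is a toy stand-in: the sag model replaces the differentiable physics simulator, and a grid search replaces the gradient-based matching the paper's differentiable simulator enables.

```python
import numpy as np

def simulate_rope_density(stiffness, bins=8):
    # Toy stand-in for the physics simulator: a stiffer rope sags less, so
    # its point cloud occupies different grid cells.
    x = np.linspace(0.0, 1.0, 200)
    sag = (1.0 - stiffness) * np.sin(np.pi * x)
    hist, _ = np.histogram(sag, bins=bins, range=(0.0, 1.0), density=True)
    return hist

def estimate_stiffness(observed_density, candidates):
    # GenORM-style identification: pick the rope parameter whose simulated
    # grid density best matches the single real-world demonstration.
    errors = [np.sum((simulate_rope_density(c) - observed_density) ** 2)
              for c in candidates]
    return float(candidates[int(np.argmin(errors))])

observed = simulate_rope_density(0.7)  # plays the role of the real demo
estimated = estimate_stiffness(observed, np.linspace(0.1, 0.9, 9))
```

Once the parameter is recovered, the parameter-conditioned policy trained in simulation can be applied to the new rope directly, which is what makes a single demonstration sufficient.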
Ranking-Incentivized Quality Preserving Content Modification
The Web is a canonical example of a competitive retrieval setting where many
documents' authors consistently modify their documents to promote them in
rankings. We present an automatic method for quality-preserving modification of
document content -- i.e., maintaining content quality -- so that the document
is ranked higher for a query by a non-disclosed ranking function whose rankings
can be observed. The method replaces a passage in the document with some other
passage. To select the two passages, we use a learning-to-rank approach with a
bi-objective optimization criterion: rank promotion and content-quality
maintenance. We used the approach as a bot in content-based ranking
competitions. Analysis of the competitions demonstrates the merits of our
approach with respect to human content modifications in terms of rank
promotion, content-quality maintenance and relevance.
Comment: 10 pages, 8 figures, 3 tables
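The passage-replacement search with a bi-objective criterion can be sketched as follows. The scoring functions below are hand-written toys standing in for the paper's learned learning-to-rank model; the passages, query terms, and weighting are hypothetical.

```python
def choose_replacement(passages, candidates, rank_score, quality_score,
                       alpha=0.5):
    # Try replacing each passage with each candidate and keep the document
    # that maximizes a weighted sum of estimated rank promotion and
    # content-quality maintenance.
    best_doc, best_val = passages, float("-inf")
    for i in range(len(passages)):
        for new in candidates:
            trial = passages[:i] + [new] + passages[i + 1:]
            val = alpha * rank_score(trial) + (1 - alpha) * quality_score(trial)
            if val > best_val:
                best_doc, best_val = trial, val
    return best_doc

# Hypothetical scorers: rank promotion counts query-term hits; quality
# penalizes very short passages.
QUERY = {"solar", "panel"}
rank_score = lambda doc: sum(w in QUERY for p in doc for w in p.split())
quality_score = lambda doc: -sum(len(p.split()) < 3 for p in doc)

doc = ["installing roofs", "general home maintenance tips"]
cands = ["choosing an efficient solar panel for your roof"]
promoted = choose_replacement(doc, cands, rank_score, quality_score)
```

The quality term keeps the search from degenerating into pure keyword stuffing, which mirrors the trade-off the competition analysis evaluates.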
Weight Fused functional sliced average variance estimation
Communications in Statistics - Simulation and Computation
DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views
Synthesizing novel view images from a few views is a challenging but
practical problem. Existing methods often struggle with producing high-quality
results or necessitate per-object optimization in such few-view settings due to
the insufficient information provided. In this work, we explore leveraging the
strong 2D priors in pre-trained diffusion models for synthesizing novel view
images. 2D diffusion models, nevertheless, lack 3D awareness, leading to
distorted image synthesis and compromising the identity. To address these
problems, we propose DreamSparse, a framework that enables a frozen
pre-trained diffusion model to generate geometry- and identity-consistent
novel view images. Specifically, DreamSparse incorporates a geometry module
designed
to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial
guidance model is introduced to convert these 3D feature maps into spatial
information for the generative process. This information is then used to guide
the pre-trained diffusion model, enabling it to generate geometrically
consistent images without tuning it. Leveraging the strong image priors in the
pre-trained diffusion models, DreamSparse is capable of synthesizing
high-quality novel views for both object and scene-level images and
generalizing to open-set images. Experimental results demonstrate that our
framework can effectively synthesize novel view images from sparse views and
outperforms baselines in both trained and open-set category images. More
results can be found on our project page:
https://sites.google.com/view/dreamsparse-webpage
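The geometric half of the story, turning sparse posed views into 3D-aware guidance for a novel viewpoint, rests on standard multi-view projection. The snippet below is only that geometric core in its simplest pinhole-camera form; the learned geometry module and spatial guidance model that produce the actual feature maps are not modeled, and the camera values are illustrative.

```python
import numpy as np

def reproject(points_3d, K, R, t):
    # Project world-space 3D points into a novel camera view with a pinhole
    # model: a minimal stand-in for the geometric reasoning behind building
    # 3D-aware spatial guidance for the frozen diffusion model.
    cam = (R @ points_3d.T + t.reshape(3, 1)).T  # world -> camera frame
    uvw = (K @ cam.T).T                          # camera -> image plane
    return uvw[:, :2] / uvw[:, 2:3]              # perspective divide

K = np.eye(3)                  # identity intrinsics for simplicity
R, t = np.eye(3), np.zeros(3)  # novel view coincides with world frame
pts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 2.0]])
uv = reproject(pts, K, R, t)
```

Reprojecting features gathered from the sparse input views into the target view is what lets the guidance stay geometrically consistent even though the diffusion model itself is never fine-tuned.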